Adaptive bimodal sensor fusion for automatic speechreading

Authors

Uwe Meier

Wolfgang Hürst

Paul Duchnowski

Abstract

We present recent work on improving the performance of automated speech recognizers by using additional visual information (lip-/speechreading), achieving an error reduction of up to […]. This paper focuses on different methods of combining the visual and acoustic data to improve recognition performance. We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN (multi-state time-delay neural network). We have developed adaptive combination methods at several levels of the recognition network. Additional information, such as the estimated signal-to-noise ratio (SNR), is used in some cases. The results of the different combination methods are shown for clean speech and for data with artificial noise (white noise, music, and motor noise). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
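The abstract does not spell out the combination rules, so the following is only a minimal sketch of what "adaptive combination using an estimated SNR" can look like at the output level of two recognizers. It assumes each modality produces a posterior distribution over the same word classes and fuses them with a weighted sum of log-posteriors. Two hypothetical weighting schemes are shown: one maps a crude SNR estimate to an audio weight through a logistic curve, the other derives the weight from the entropy (confidence) of each modality's output. All function names, the logistic mapping, and its parameters (`midpoint_db`, `slope`) are illustrative assumptions, not the MS-TDNN combination described in the paper.

```python
import math
from typing import List, Sequence


def estimate_snr_db(frame_energies: Sequence[float], noise_energies: Sequence[float]) -> float:
    """Crude SNR estimate (dB) from mean frame energies.

    Assumption: noise_energies come from frames known to contain only noise
    (e.g. leading silence); frame_energies come from speech-plus-noise frames.
    """
    noise = max(sum(noise_energies) / len(noise_energies), 1e-12)
    total = sum(frame_energies) / len(frame_energies)
    speech = max(total - noise, 1e-12)  # subtract the noise floor; avoid log(0)
    return 10.0 * math.log10(speech / noise)


def snr_audio_weight(snr_db: float, midpoint_db: float = 10.0, slope: float = 0.3) -> float:
    """Logistic map from SNR to an audio weight in (0, 1).

    midpoint_db and slope are hypothetical hand-picked values.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def entropy_audio_weight(p_audio: Sequence[float], p_visual: Sequence[float]) -> float:
    """Weight the modality with the more peaked (lower-entropy) posterior more heavily."""
    h_a, h_v = entropy(p_audio), entropy(p_visual)
    if h_a + h_v == 0.0:
        return 0.5  # both fully confident: no basis to prefer either
    return h_v / (h_a + h_v)


def fuse(p_audio: Sequence[float], p_visual: Sequence[float], audio_weight: float) -> List[float]:
    """Weighted sum of log-posteriors, renormalised into a distribution (softmax)."""
    eps = 1e-12
    scores = [audio_weight * math.log(a + eps) + (1.0 - audio_weight) * math.log(v + eps)
              for a, v in zip(p_audio, p_visual)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    p_a = [0.4, 0.35, 0.25]  # noisy audio: fairly flat posterior
    p_v = [0.1, 0.8, 0.1]    # visual: confident in class 1
    snr = estimate_snr_db([2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
    print("SNR-based fusion:", fuse(p_a, p_v, snr_audio_weight(snr)))
    print("Entropy-based fusion:", fuse(p_a, p_v, entropy_audio_weight(p_a, p_v)))
```

The fixed logistic curve is exactly the kind of hand-tuned parameter the abstract says the paper's methods avoid; the entropy-based variant adapts per utterance without such tuning, at the cost of trusting each recognizer's own confidence.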


Similar resources

A New Approach to Self-Localization for Mobile Robots Using Sensor Data Fusion

This paper proposes a new approach for calibration of the dead-reckoning process. Using the well-known UMBmark (University of Michigan Benchmark) is not sufficient for a desirable calibration of dead reckoning. Besides, existing calibration methods usually require explicit measurement of the actual motion of the robot. Some recent methods use the smart encoder trailer or long range finder sensors such ...

Large-vocabulary audio-visual speech recognition by machines and humans

We compare automatic recognition with human perception of audio-visual speech, in the large-vocabulary, continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans, when combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a p...


Bilingual corpus for AVASR using multiple sensors and depth information

In this paper we present the Bilingual Audio-Visual Corpus with Depth information (BAVCD). The database contains utterances of connected digits, spoken by 15 subjects in English and 6 subjects in Greek, and collected employing multiple audio-visual sensors. Among them, of particular interest is the use of the Microsoft Kinect device, which is able to capture facial depth images using the struct...


Designing a Home Security System using Sensor Data Fusion with DST and DSMT Methods

Today, due to the importance and necessity of implementing security systems in homes and other buildings, systems with higher certainty, lower cost, and sensor fusion methods are more attractive to researchers as applicable, high-performance methods. In this paper, the application of Dempster-Shafer evidential theory and also the newer, more general Dezert-Smarandache theory ...

Exploiting lower face symmetry in appearance-based automatic speechreading

Appearance-based visual speech feature extraction is being widely used in the automatic speechreading and audio-visual speech recognition literature. In its most common application, the discrete cosine transform (DCT) is utilized to compress the image of the speaker’s mouth region-of-interest (ROI), and the highest energy spatial frequency components are retained as visual features. Good genera...


Publication date: 1996
